Presentation: "The Smallest Distributed System"
Even in its smallest incarnation, a distributed system is bound to fall prey to network partitions, data arriving out of order and eventual consistency. The more data pushed through the system, the more painful it gets to ignore these pitfalls.
At Travis CI we had to learn the hard way that anything can fail at any time, and that our chance of fixing it is to accept that and rework our system to be more resilient to failure. That meant going back to solutions that turned out to be much simpler than our initial approaches but that required rethinking all parts of the application, from the code that runs tests up to the user interface that tails build logs as they are streamed from the build process.
The lessons learned are surprisingly simple, and they turned Travis CI into a system that we can grow with a lot more confidence than we could initially. We took it from a single, monolithic application to a distributed system.
This is the story of our failures and what we learned from them.